51 research outputs found

    Discovering Dialog Rules by means of an Evolutionary Approach

    Designing the rules for the dialog management process is one of the most resource-consuming tasks when developing a dialog system. Although statistical approaches to dialog management are becoming mainstream in research and industrial contexts, many systems are still developed following the rule-based or hybrid paradigms, for example, when developers require deterministic system responses to keep total control over the decisions made by the system, or when the infrastructure employed is designed for rule-based systems using technologies currently used in commercial platforms. In this paper, we propose the use of evolutionary algorithms to automatically obtain the dialog rules that are implicit in a dialog corpus. Our proposal makes it possible to exploit the benefits of statistical approaches to build rule-based systems. It has been evaluated with a practical spoken dialog system, for which we have automatically obtained a set of fuzzy rules that successfully manage the dialog.
    The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823907 (MENHIR project: https://menhir-project.eu).
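    A minimal sketch of the idea, assuming a deliberately simplified rule representation (each rule maps a dialog state directly to a system action) and a toy corpus; the paper evolves fuzzy rules from a real dialog corpus, so everything below is illustrative:

    ```python
    import random

    # Toy corpus of (dialog_state, system_action) pairs; purely illustrative.
    CORPUS = [("greet", "ask_name"), ("greet", "ask_name"), ("ask_name", "ask_date"),
              ("ask_date", "confirm"), ("confirm", "close"), ("ask_name", "ask_date")]
    STATES = sorted({s for s, _ in CORPUS})
    ACTIONS = sorted({a for _, a in CORPUS})

    def fitness(rules):
        """Fraction of corpus turns reproduced by the candidate rule set."""
        return sum(rules[s] == a for s, a in CORPUS) / len(CORPUS)

    def evolve(pop_size=30, generations=40, mutation_rate=0.2, seed=1):
        rng = random.Random(seed)
        # An individual maps each dialog state to one system action.
        pop = [{s: rng.choice(ACTIONS) for s in STATES} for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            survivors = pop[: pop_size // 2]          # elitist selection
            children = []
            while len(survivors) + len(children) < pop_size:
                p1, p2 = rng.sample(survivors, 2)
                child = {s: rng.choice((p1[s], p2[s])) for s in STATES}  # crossover
                if rng.random() < mutation_rate:                         # mutation
                    child[rng.choice(STATES)] = rng.choice(ACTIONS)
                children.append(child)
            pop = survivors + children
        return max(pop, key=fitness)

    best = evolve()  # rule set that best reproduces the corpus behaviour
    ```

    The evolved rule set is deterministic at run time, which matches the motivation in the abstract: the statistical regularities of the corpus are distilled into inspectable rules.
    
    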

    An approach to develop intelligent learning environments by means of immersive virtual worlds

    Merging Immersive Virtual Environments, Natural Language Processing and Artificial Intelligence techniques provides a number of advantages for developing Intelligent Environments for multiple applications. This paper is focused on the application of these technologies to develop intelligent learning environments. Education is one of the most interesting applications of immersive virtual environments, as their flexibility can be exploited in order to create heterogeneous groups from all over the world who can collaborate synchronously in different virtual spaces. We highlight the potential of virtual worlds as an educative tool and propose a model to create learning environments within Second Life or OpenSimulator combining the Moodle learning management system, embodied conversational metabots, and programmable 3D objects. Our proposal has been applied in several subjects of the Computer Science degree at the Carlos III University of Madrid. The results of the evaluation show that the developed learning environment fosters engagement and collaboration and helps students to better understand complex concepts.
    Spanish Government TEC2012-37832-C02-01; Consejo Interinstitucional de Ciencia y Tecnologia (CICYT) TEC2011-28626-C02-02; Project CAM CONTEXTS (S2009/TIC-1485).

    A Neural Network Approach to Intention Modeling for User-Adapted Conversational Agents

    Spoken dialogue systems have been proposed to enable a more natural and intuitive interaction with the environment and human-computer interfaces. In this contribution, we present a framework based on neural networks that allows modeling of the user's intention during the dialogue and uses this prediction to dynamically adapt the dialogue model of the system, taking into consideration the user's needs and preferences. We have evaluated our proposal to develop a user-adapted spoken dialogue system that facilitates tourist information and services, and provide a detailed discussion of the positive influence of our proposal on the success of the interaction, the information and services provided, and the quality perceived by the users.

    A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset

    Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the task to be adapted. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models. Results showed that sequential models beat static models by a narrow margin. Error analysis indicated that the visual system could be improved with a detector of high-emotional-load frames, which opens a new line of research into ways to learn from videos.
    Finally, combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. The results demonstrate that these modalities carry relevant information to detect users' emotional state and that their combination improves the final system performance.
    The work leading to these results was supported by the Spanish Ministry of Science and Innovation through the projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22, funded by MCIN/AEI/10.13039/501100011033), CAVIAR (TEC2017-84593-C2-1-R, funded by MCIN/AEI/10.13039/501100011033/FEDER "Una manera de hacer Europa"), and AMIC-PoC (PDC2021-120846-C42, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR"). This research also received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823907 (http://menhir-project.eu, accessed on 17 November 2021). Furthermore, R.K.'s research was supported by the Spanish Ministry of Education (FPI grant PRE2018-083225).
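    As an illustration of the late fusion strategy, here is a minimal sketch with made-up posterior vectors; the emotion labels follow the RAVDESS classes, but the weights and probabilities are hypothetical, not the paper's actual model outputs:

    ```python
    # Hypothetical per-class posteriors from a speech emotion recognizer (SER)
    # and a facial emotion recognizer (FER) for one clip.
    EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]

    def late_fusion(p_speech, p_face, w_speech=0.5):
        """Combine the two modalities at decision level with a weighted average."""
        fused = [w_speech * s + (1 - w_speech) * f for s, f in zip(p_speech, p_face)]
        return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)], fused

    p_speech = [0.05, 0.05, 0.40, 0.10, 0.25, 0.05, 0.05, 0.05]
    p_face   = [0.10, 0.05, 0.35, 0.05, 0.30, 0.05, 0.05, 0.05]
    label, fused = late_fusion(p_speech, p_face)
    # With equal weights the fused "happy" score (0.375) beats "angry" (0.275).
    ```

    Fusing at decision level keeps the two recognizers independent, so either modality can be retrained or replaced without touching the other.
    
    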

    Fine-Tuning BERT Models for Intent Recognition Using a Frequency Cut-Off Strategy for Domain-Specific Vocabulary Extension

    Intent recognition is a key component of any task-oriented conversational system. The intent recognizer can be used first to classify the user's utterance into one of several predefined classes (intents) that help to understand the user's current goal. Then, the most adequate response can be provided accordingly. Intent recognizers also often appear as a form of joint model for performing the natural language understanding and dialog management tasks together as a single process, thus simplifying the set of problems that a conversational system must solve. This is especially true for frequently asked question (FAQ) conversational systems. In this work, we first present an exploratory analysis in which different deep learning (DL) models for intent detection and classification were evaluated. In particular, we experimentally compare and analyze conventional recurrent neural networks (RNN) and state-of-the-art transformer models. Our experiments confirmed that the best performance is achieved with transformers.
    Specifically, the best performance was achieved by fine-tuning the so-called BETO model (a Spanish pretrained bidirectional encoder representations from transformers (BERT) model from the Universidad de Chile) on our intent detection task. Then, as the main contribution of the paper, we analyze the effect of inserting unseen domain words to extend the vocabulary of the model as part of the fine-tuning or domain-adaptation process. In particular, a very simple word frequency cut-off strategy is experimentally shown to be a suitable method for driving the vocabulary learning decisions over unseen words. The results of our analysis show that the proposed method helps to effectively extend the original vocabulary of the pretrained models. We validated our approach with a selection of the corpus acquired with the Hispabot-Covid19 system, obtaining satisfactory results.
    The work leading to these results was supported by the Spanish Ministry of Science and Innovation through the R&D&i projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22, funded by MCIN/AEI/10.13039/501100011033), CAVIAR (TEC2017-84593-C2-1-R, funded by MCIN/AEI/10.13039/501100011033/FEDER "Una manera de hacer Europa"), and AMIC-PoC (PDC2021-120846-C42, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR"). This research also received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823907 (http://menhir-project.eu, accessed on 2 February 2022). Furthermore, R.K.'s research was supported by the Spanish Ministry of Education (FPI grant PRE2018-083225).
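    The frequency cut-off idea can be sketched as follows; the toy corpus, base vocabulary, and threshold below are illustrative, not taken from the Hispabot-Covid19 data:

    ```python
    from collections import Counter

    def select_new_tokens(domain_texts, known_vocab, min_freq=2):
        """Return unseen domain words whose corpus frequency reaches min_freq."""
        counts = Counter(w for text in domain_texts for w in text.lower().split())
        return sorted(w for w, c in counts.items()
                      if c >= min_freq and w not in known_vocab)

    # Toy domain corpus and base vocabulary (purely illustrative).
    texts = ["sintomas de covid", "vacuna covid dosis", "dosis de vacuna", "pcr hoy"]
    base_vocab = {"de", "hoy"}
    new_tokens = select_new_tokens(texts, base_vocab, min_freq=2)
    # new_tokens == ['covid', 'dosis', 'vacuna']; words below the cut-off
    # ('sintomas', 'pcr') are left to the model's subword tokenization.
    ```

    With Hugging Face models, the selected words would then be registered via `tokenizer.add_tokens(new_tokens)` followed by `model.resize_token_embeddings(len(tokenizer))` before fine-tuning.
    
    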

    An application of conversational systems to promote healthy lifestyle habits

    Recent reports indicate the multiple benefits of adopting conversational systems for healthcare, providing automation of management tasks in overburdened healthcare systems, improved patient experiences, and better efficiency and productivity rates. In this paper, we describe a conversational system aimed at promoting healthy lifestyle habits related to nutrition and physical exercise. The system has been developed using Google's Dialogflow platform and a combination of different APIs and data repositories in the cloud, and has been integrated into the Facebook Messenger instant messaging platform. The results of the preliminary assessment of the system through a subjective questionnaire show a high degree of user satisfaction with the functionalities provided and the help offered in fulfilling the predefined objectives.

    Mobile Conversational Interface for Stuttering Treatment

    Mobile devices have become tools of daily use that have changed the way people interact, thanks to the continuous and practically ubiquitous access they offer to large amounts of information and to communication services. The combination of these devices and conversational interfaces has enabled increasingly advanced applications in the area of mobile health (mHealth), a field of eHealth in which the practice of medicine and public health is supported by mobile devices. In this paper we present an app that uses the potential of conversational interfaces to help improve speech fluency in people who stutter.

    Coordination of Speech Recognition Devices in Intelligent Environments with Multiple Responsive Devices

    Devices with oral interfaces are enabling interesting new interaction scenarios and ways of interaction in ambient intelligence settings. The use of several such devices in the same environment opens up the possibility of comparing the inputs gathered from each of them and performing a more accurate recognition and processing of user speech. However, the combination of multiple devices presents coordination challenges, as the processing of one voice signal by different speech processing units may result in conflicting outputs, and it is necessary to decide which is the most reliable source. This paper presents an approach to rank several sources of spoken input in multi-device environments in order to give preference to the input with the highest estimated quality. The voice signals received by the multiple devices are assessed in terms of their calculated acoustic quality and the reliability of the speech recognition hypotheses produced. After this assessment, each input is assigned a unique score that allows the audio sources to be ranked so as to pick the best one to be processed by the system. In order to validate this approach, we have performed an evaluation using a corpus of 4608 audios recorded in a two-room intelligent environment with 24 microphones. The experimental results show that our ranking approach makes it possible to successfully orchestrate an increasing number of acoustic inputs, obtaining better recognition rates than considering a single input, both in clear and noisy settings.
    This research received funding from the project DEP2015-70980-R of the Spanish Ministry of Economy and Competitiveness (MINECO) and the European Regional Development Fund (ERDF), and from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823907 ('Mental health monitoring through interactive conversations', MENHIR project), and received input from the COST Action IC1303 AAPEL.
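    A simplified sketch of the ranking step, assuming each candidate input carries an estimated signal-to-noise ratio and an ASR hypothesis confidence; the actual quality measures and their weighting in the paper are more elaborate:

    ```python
    def rank_sources(sources, w_snr=0.5):
        """Rank candidate inputs by a combined acoustic-quality / ASR-confidence score.

        `sources` is a list of dicts with an estimated SNR in dB and the
        confidence of the recognition hypothesis in [0, 1]; the linear
        weighting is an illustrative simplification.
        """
        snrs = [s["snr_db"] for s in sources]
        lo, hi = min(snrs), max(snrs)

        def score(s):
            snr_norm = (s["snr_db"] - lo) / (hi - lo) if hi > lo else 1.0
            return w_snr * snr_norm + (1 - w_snr) * s["asr_confidence"]

        return sorted(sources, key=score, reverse=True)

    # Hypothetical readings from three microphones hearing the same utterance.
    mics = [
        {"device": "mic_kitchen", "snr_db": 12.0, "asr_confidence": 0.61},
        {"device": "mic_ceiling", "snr_db": 21.0, "asr_confidence": 0.88},
        {"device": "mic_table",   "snr_db": 18.0, "asr_confidence": 0.93},
    ]
    best = rank_sources(mics)[0]  # mic_ceiling: 0.5 * 1.0 + 0.5 * 0.88 = 0.94
    ```

    Only the top-ranked input is then forwarded to the rest of the dialog system, so downstream components remain unaware of how many microphones captured the utterance.
    
    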

    ChatSubs: A dataset of dialogues in Spanish, Catalan, Basque and Galician extracted from movie subtitles for developing advanced conversational models

    The ChatSubs dataset [5] contains dialogue data in Spanish and three of Spain's co-official languages (Catalan, Basque, and Galician). It has been obtained from OpenSubtitles, from which we have gathered the movie subtitles in our languages of interest and processed them to generate clearly segmented dialogues and their turns. The data processing code is publicly accessible. The result is 206,706 JSON files with more than 20 million dialogues and 96 million turns, which represents one of the biggest dialogue corpora available, as other similar datasets in better-resourced languages do not reach 500k dialogues or present less defined conversations. Thus, the ChatSubs dataset is an ideal resource for research teams interested in training dialogue models in Spanish, Catalan, Basque, and Galician.
    CONVERSA (TED2021-132470B-I00) funded by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR.
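    A rough sketch of the kind of turn segmentation involved, assuming subtitle cues with timestamps and an illustrative silence threshold; this is not the actual ChatSubs pipeline, whose processing code is published separately:

    ```python
    import json

    def segment_dialogues(cues, max_gap=5.0):
        """Group timed subtitle cues into dialogues, splitting on long silences.

        `cues` are (start_seconds, end_seconds, text) tuples; the 5-second gap
        threshold is an illustrative choice.
        """
        dialogues, current, prev_end = [], [], None
        for start, end, text in cues:
            if prev_end is not None and start - prev_end > max_gap:
                dialogues.append({"turns": current})
                current = []
            current.append(text)
            prev_end = end
        if current:
            dialogues.append({"turns": current})
        return dialogues

    cues = [(0.0, 1.5, "¿Vienes?"), (2.0, 3.0, "Sí, ahora voy."),
            (30.0, 31.0, "Buenas noches."), (32.0, 33.5, "Hasta mañana.")]
    # The 27-second silence splits the four cues into two two-turn dialogues.
    print(json.dumps(segment_dialogues(cues), ensure_ascii=False))
    ```
    
    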

    Using Language Technologies and Virtual Worlds to Develop Educative Applications

    Continuous advances in the development of information technologies have led to the possibility of accessing learning contents from practically anywhere, at any time and almost instantaneously. However, accessibility is not always a main criterion in the design of educative applications, specifically to facilitate their use by disabled people. Different technologies have recently emerged to foster the accessibility of computers and new mobile devices, favouring a more natural communication between the student and the developed educative systems. This paper describes an Educational Innovation Project focused on the application of Multiagent Systems, Spoken Dialog Systems, and Virtual Worlds to develop an educative platform.
    Work partially funded by projects TRA2011-29454-C03-03, MINECO TEC2012-37832-C02-01, CICYT TEC2011-28626-C02-02 and CAM CONTEXTS (S2009/TIC-1485). Developed within the framework of the project "Aplicación de nuevas metodologías docentes para un mejor aprovechamiento de las clases prácticas" (11th Call for Support to Teaching Innovation Experiences in Undergraduate and Postgraduate Studies at UC3M).